The positive effects of root-colonizing bacteria cooperating with plants lead to improved growth and/or health of their eukaryotic hosts. Some of these Plant Growth-Promoting Rhizobacteria (PGPR) display several plant-beneficial properties, suggesting that the accumulation of the corresponding genes could have been selected in these bacteria. Here, this issue was addressed using 23 genes contributing directly or indirectly to established PGPR effects, based on genome sequence analysis of 304 contrasted Alpha-, Beta- and Gammaproteobacteria. Most of the 23 genes studied were also found in non-PGPR Proteobacteria, and none of them was common to all 25 PGPR genomes studied. However, ancestral character reconstruction indicated that gene transfers (predominantly ancient) resulted in characteristic gene combinations according to taxonomic subgroups of PGPR strains. This suggests that PGPR-plant cooperation could have been established separately in various taxa, yielding PGPR strains that use different gene assortments. The number of genes contributing to plant-beneficial functions increased along the continuum animal pathogens, phytopathogens, saprophytes, endophytes/symbionts, PGPR, indicating that the accumulation of these genes (and possibly of different plant-beneficial traits) might be an intrinsic PGPR feature. This work uncovered preferential associations between certain genes contributing to phytobeneficial traits and provides new insights into the emergence of PGPR.



Plant roots host a large variety of bacteria, many of them cooperating with the plant and enhancing plant nutrition, stress tolerance or health. Several different modes of action are documented in these Plant Growth-Promoting Rhizobacteria (PGPR). Direct effects on plants may involve enhanced availability of nutrients, stimulation of root system development via production of phytohormones and other signals, interference with the plant's ethylene synthesis, and/or induced systemic resistance. Indirect beneficial effects of PGPR on plants entail competition with, or antagonism towards, phytoparasites.

Despite extensive literature on PGPR modes of action (especially in the Proteobacteria), the molecular features that define a PGPR remain elusive, because PGPR status is not always well defined. First, PGPR may occupy different microbial habitats, ranging from saprophytic soil bacteria that colonize the rhizosphere to bacteria that can also colonize internal root tissues. This means that PGPR are often not easily distinguished from saprophytes without plant-beneficial effects (especially plant commensals) on the one hand, and from vertically-inherited endophytes or plant endosymbionts on the other. Second, several bacteria occupy alternative ecological niches, and some may at times function as PGPR. For instance, certain tumor-inducing Agrobacterium strains have plant growth stimulation potential on non-susceptible plant hosts, a property also found in an Escherichia coli gut commensal. Third, the genes implicated in plant-beneficial functions range from genes directly conferring plant-beneficial properties, such as nif (nitrogen fixation) or phl (phloroglucinol synthesis), to genes contributing indirectly or secondarily to a variety of cell functions, including plant-beneficial ones, such as pqq (pyrroloquinoline quinone synthesis). Fourth, many PGPR strains are not yet recognized as such (as determination of PGPR status requires experimental assessment), and it is very likely that not all plant-beneficial traits and the corresponding genes have been identified yet. Fifth, the assessment of genes encoding plant-beneficial properties is commonly restricted to particular bacterial clades, if not particular PGPR strains, without a more general analysis of gene distribution across several bacterial clades.
Table 1 | Distribution of plant-beneficial function contributing (PBFC) genes according to the primary ecological lifestyle documented for the

| Function | Gene | PGPR (25)^a | Endophytes/symbionts (56) | Saprophytes (29) | Phytopathogens (59) | Animal pathogens (135) |
|---|---|---|---|---|---|---|
| Phosphate solubilization | pqqB | 20 | 36 | 26 | 34 | 13 |
| | pqqC | 20 | 36 | 26 | 35 | 13 |
| | pqqD | 20 | 36 | 26 | 35 | 13 |
| | pqqE | 20 | 36 | 26 | 35 | 13 |
| | pqqF | 10 | 17 | 8 | 16 | 7 |
| | pqqG | 7 | 17 | 9 | 13 | 4 |
| 2,4-Diacetylphloroglucinol synthesis | phlA | 3 | 0 | 0 | 0 | 0 |
| | phlB | 3 | 0 | 0 | 0 | 0 |
| | phlC | 3 | 0 | 0 | 0 | 0 |
| | phlD | 3 | 0 | 0 | 0 | 4 |
| Hydrogen cyanide synthesis | hcnA | 3 | 9 | 2 | 0 | 4 |
| | hcnB | 3 | 9 | 2 | 0 | 4 |
| | hcnC | 3 | 9 | 2 | 0 | 4 |
| Acetoine/2,3-butanediol synthesis | budA | 5 | 2 | 3 | 14 | 5 |
| | budB | 5 | 2 | 3 | 14 | 5 |
| | budC | 11 | 12 | 4 | 10 | 5 |
| Nitric oxide synthesis | nirK | 6 | 14 | 1 | 1 | 108 |
| Auxin synthesis | ipdC | 5 | 2 | 3 | 10 | 5 |
| | ppdC | 2 | 2 | 0 | 0 | 0 |
| ACC deamination | acdS | 9 | 31 | 16 | 26 | 44 |
| Nitrogen fixation | nifD | 9 | 23 | 3 | 3 | 0 |
| | nifH | 9 | 23 | 3 | 3 | 0 |
| | nifK | 9 | 23 | 3 | 3 | 0 |

^a The number of bacteria in each category is indicated in parentheses.



Despite these limitations, a number of emblematic PGPR model strains have been extensively characterized over the last 20 years, uncovering the molecular basis of at least some of their plant-beneficial effects. These studies have shown that many PGPR strains typically harbor more than one plant-beneficial property, and it could be hypothesized that the accumulation of genes contributing (whether directly or indirectly) to plant-beneficial traits has been selected by the interaction of these bacteria with plants. On this basis, it could even be expected that PGPR might be identified by their particular assortment of genes contributing to plant-beneficial functions. So far, a more general description of the occurrence of these genes, including in bacteria not interacting with plants, is still lacking. Such knowledge would bring fundamental insights into the potential associations of phytobeneficial traits in PGPR, and this can now be achieved based on genome comparisons and phylogenetic analyses.

Hence, our objective was to assess the distribution of 23 genes contributing to eight key plant-beneficial functions using genomic and phylogenetic analyses, as well as ancestral state character reconstruction to infer possible gene transfers. These plant-beneficial function contributing genes (hereafter referred to as PBFC genes) were investigated using the genomes of 25 emblematic proteobacterial PGPR (i.e. bacteria colonizing root surface and/or tissues and displaying plant growth-promotion effects). These genomes were also compared with those of 279 other Alpha-, Beta- and Gammaproteobacteria representing various taxonomic groups and ecological types, such as (i) endophytes/symbionts (i.e. asymptomatic, endophytic bacteria possibly in symbiotic interaction with the plant but for which plant-beneficial effects are not documented, as well as root-nodulating, diazotrophic bacteria), (ii) saprophytes (i.e. bacteria from various environments including soil, some of them possibly colonizing roots but without established plant-beneficial effects), (iii) plant pathogens and (iv) animal pathogens.

The 23 genes selected included (i) the nitrogenase-encoding genes nifHDK responsible for nitrogen fixation in proteobacterial PGPR from Azospirillum, Burkholderia and other genera, (ii) the pyrroloquinoline quinone (PQQ) biosynthesis genes pqqBCDEFG contributing to mineral phosphate solubilization in the PGPR Pseudomonas fluorescens F113, Erwinia herbicola and Enterobacter intermedium, (iii) the indole-3-pyruvate decarboxylase/phenylpyruvate decarboxylase gene ipdC/ppdC of the indole-3-pyruvate pathway for synthesis of the auxinic phytohormone indole-3-acetic acid (IAA) in Azospirillum brasilense Sp245, Enterobacter cloacae and other Enterobacteriaceae PGPR, (iv) the copper nitrite reductase gene nirK leading to formation of the NO root-branching signal in Azospirillum brasilense Sp245, (v) the 1-aminocyclopropane-1-carboxylate (ACC) deaminase gene acdS in Pseudomonas putida GR12-2 and various other Pseudomonas PGPR, which enables degradation of the plant's ethylene precursor, (vi) the acetoine genes budAB and 2,3-butanediol gene budC (induced systemic resistance) in the PGPR Enterobacter sp. 638, and (vii) genes hcnABC (hydrogen cyanide) and phlACBD (2,4-diacetylphloroglucinol) for synthesis of antimicrobial compounds in P. fluorescens F113, P. protegens CHA0 and many other PGPR pseudomonads.

Results 

Contrasted co-occurrence patterns of PBFC genes in proteobacterial PGPR. In the 25 sequenced PGPR strains, which belonged to the genera Azospirillum, Rhizobium/Agrobacterium (Alphaproteobacteria), Azoarcus, Burkholderia (Betaproteobacteria), and Enterobacter, Klebsiella, Pantoea, Pseudomonas, Serratia (Gammaproteobacteria), the PBFC genes were found in 2 (for gene ppdC) to 20 (pqqBCDE) of the genomes (Table 1). The PGPR strains harbored from 1 (i.e. acdS in Burkholderia cepacia 383 and B. phytofirmans PsJN) to 14 of the 23 PBFC genes studied (in P. protegens Pf-5, P. brassicacearum NFM421 and P. fluorescens F113), i.e. 7.5 ± 3.1 PBFC genes per strain (Supplementary Fig. S1a). Fisher's exact test (P < 0.05) showed that phlACBD and hcnABC co-occurred significantly in certain PGPR strains (Fig. 1), i.e. pseudomonads. Three other separate groups of co-occurring genes were identified, i.e. budAB and ipdC, the operon nifHDK, and the clustered genes pqqBCDE. No other significant co-occurrence of PBFC genes was found.
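The pairwise co-occurrence tests above can be illustrated with a minimal Python sketch of Fisher's exact test applied to two presence/absence profiles. It assumes both 0/1 profiles cover the same genomes and defaults to a two-sided test; the text does not state which alternative was used, and the example profiles are hypothetical, not data from this study.

```python
from math import comb


def contingency(profile_a, profile_b):
    """2x2 table [[both, a only], [b only, neither]] from two 0/1 profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    pairs = list(zip(profile_a, profile_b))
    both = sum(1 for a, b in pairs if a and b)
    a_only = sum(1 for a, b in pairs if a and not b)
    b_only = sum(1 for a, b in pairs if b and not a)
    neither = len(pairs) - both - a_only - b_only
    return [[both, a_only], [b_only, neither]]


def fisher_exact(table, alternative="two-sided"):
    """Fisher's exact test P value for a 2x2 table with fixed margins."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    total = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that x genomes carry both genes.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    if alternative == "greater":  # more co-occurrence than expected by chance
        return min(1.0, sum(prob(x) for x in range(a, hi + 1)))
    if alternative == "less":
        return min(1.0, sum(prob(x) for x in range(lo, a + 1)))
    # Two-sided: sum the probabilities of all tables no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    p_obs = prob(a)
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7)))


# Hypothetical profiles over 10 genomes (1 = gene present).
phl = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
hcn = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
table = contingency(phl, hcn)
print(table, fisher_exact(table))
```

With 23 genes there are 253 gene pairs to test; the same computation is available as R's `fisher.test` or `scipy.stats.fisher_exact`.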
Similar or lower prevalence of PBFC genes in Proteobacteria of other ecological types. The genomes of 279 other sequenced Proteobacteria corresponding to saprophytes or endophytes/symbionts without established PGPR status, as well as pathogens of plants or animals, were also studied. For the 56 endophytes/symbionts, PBFC genes were found in 0 (for phlACBD) to 36 (pqqBCDE) of the genomes (Table 1). Whereas two bacteria carried none of the 23 PBFC genes, these genes were widespread in others, with eight strains harboring as many as 10 PBFC genes each. Overall, the endophytes/symbionts harbored 6.1 ± 2.6 PBFC genes per strain, but the difference with PGPR was not significant (P = 0.06). Pairwise Fisher's exact tests of PBFC gene co-occurrence (P < 0.05) revealed four groups, i.e. hcnABC and pqqBCDE linked by pqqFG genes, as well as nifHDK/acdS and budAB/ipdC further apart (Fig. 2a).
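The "groups" reported in these network analyses can be read as connected components of a graph whose nodes are genes and whose edges are gene pairs passing the test. A minimal union-find sketch, assuming the pairwise P values have already been computed; the values below are hypothetical placeholders, not results from this study.

```python
ALPHA = 0.05  # significance threshold used for drawing an edge


def cooccurrence_groups(genes, pairwise_p, alpha=ALPHA):
    """Connected components of the graph linking gene pairs with P < alpha."""
    parent = {gene: gene for gene in genes}

    def find(gene):
        # Follow parent links to the component root, halving the path as we go.
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for (gene_a, gene_b), p in pairwise_p.items():
        if p < alpha:
            parent[find(gene_a)] = find(gene_b)  # merge the two components

    components = {}
    for gene in genes:
        components.setdefault(find(gene), []).append(gene)
    # Largest groups first; singletons are genes without any significant co-occurrence.
    return sorted(components.values(), key=lambda members: (-len(members), members))


genes = ["hcnA", "pqqF", "pqqB", "nifH", "acdS", "budA", "ipdC", "nirK"]
pvals = {  # hypothetical pairwise P values
    ("hcnA", "pqqF"): 0.01,
    ("pqqF", "pqqB"): 0.02,
    ("nifH", "acdS"): 0.03,
    ("budA", "ipdC"): 0.001,
    ("nirK", "pqqB"): 0.40,
}
groups = cooccurrence_groups(genes, pvals)
print(groups)
```

Here hcnA and pqqB end up in one group only through pqqF, which mirrors how the text describes genes "linked by" an intermediate gene.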

Within the 29 saprophytes, PBFC genes were found in 0 (for phlACBD and ppdC) to 26 (pqqBCDE) of the genomes (Table 1). Although three bacterial strains showed none of the studied genes, one strain (Pantoea sp. At-9b) exhibited as many as 12 genes. Overall, saprophytic strains contained 5.5 ± 1.8 genes per genome. This is significantly lower than in PGPR (P < 0.05) but not different from endophytes/symbionts (P = 0.44). Co-occurrence analysis of PBFC genes in saprophytic bacteria revealed five separate groups, i.e. hcnABC, pqqBCDE, pqqFG, nifHDK and budABC/ipdC (Fig. 2b).

In the 59 phytopathogenic bacteria, PBFC genes were found in 0 (for the 8 genes ppdC, phlACBD and hcnABC) to 35 (pqqCDE) of the genomes (Table 1). Whereas seven phytopathogens (Xylella sp. and Xanthomonas albilineans) did not contain any of the 23 PBFC genes, as many as 8 PBFC genes occurred in Erwinia and Pantoea species. This gave overall 4.3 ± 2.2 PBFC genes per strain, which was lower than in PGPR and endophytes/symbionts (P < 0.05) but not significantly lower than in saprophytes (P = 0.06). Pairwise Fisher's exact tests (P < 0.05) of PBFC gene co-occurrence revealed two independent groups, i.e. pqqBCDEFG linked to acdS via pqqG, and budABC/ipdC with nifHDK (Fig. 2c).

Most PBFC genes were not prevalent in the 135 animal pathogens. Except for nirK, present in 109 of them, the other PBFC genes were either infrequent (ranging from 4 genomes for pqqG, phlD and hcnABC to 44 genomes for acdS) or absent (nifHDK, ppdC, phlACB). The number of PBFC genes varied from 0 (in 9 animal pathogens) to 9 (in 7 other animal pathogens), i.e. 1.8 ± 1.2 PBFC genes per strain overall, which was lower than for all other ecological types (all P < 0.05). Pairwise Fisher's exact tests (P < 0.05) revealed a single group comprising three subgroups extensively linked with one another, i.e. budABC/ipdC, pqqBCDEF and hcnABC/pqqG/phlD (Fig. 2d).

Figure 1 | Co-occurrence network of the PBFC genes for the 25 PGPR genomes. The genes are depicted with a colored circle according to their encoded function. Each significant co-occurrence (Fisher's exact test; P < 0.05) is represented by an edge (line) linking the corresponding genes. Several PBFC genes found in PGPR (i.e. pqqF, pqqG, budC, nirK, ppdC and acdS) did not display significant co-occurrence with any other gene.

Distribution of PBFC genes across all 304 proteobacterial genomes reveals taxonomic specificities. Whereas phlACB were only retrieved in PGPR (in 3 of 25 genomes), the other PBFC genes were recovered in bacteria of different ecological types. Many occurred in PGPR as well as in endophytes/symbionts, saprophytes and phytopathogens, especially pqqCDE (36 of 56, 26 of 29 and 35 of 59 genomes, respectively), and with a lower prevalence ipdC (2 of 56, 2 of 29 and 10 of 59 genomes, respectively) and nifHDK (23 of 56, 3 of 29 and 3 of 59 genomes, respectively). In contrast, the hcnABC genes were retrieved in PGPR (3 of 25 genomes), saprophytes (2 of 29 genomes), endophytes/symbionts (9 of 56 genomes) and animal pathogens (4 of 135 genomes), but were absent in plant pathogens.
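The ecological breadth of each gene can be read directly from Table 1. The small Python sketch below uses a subset of the Table 1 counts, transcribed as-is, to compute per-group prevalence and flag genes found in a single ecological type. It is only an illustration of the bookkeeping; the group labels are shorthand chosen here.

```python
GROUPS = ("PGPR", "endophytes/symbionts", "saprophytes", "phytopathogens", "animal pathogens")
GROUP_SIZES = (25, 56, 29, 59, 135)

# Number of genomes carrying each gene, in GROUPS order (subset of Table 1).
COUNTS = {
    "phlA": (3, 0, 0, 0, 0),
    "hcnA": (3, 9, 2, 0, 4),
    "pqqC": (20, 36, 26, 35, 13),
    "nifH": (9, 23, 3, 3, 0),
    "ppdC": (2, 2, 0, 0, 0),
}


def prevalence(gene):
    """Fraction of genomes carrying `gene` in each ecological group."""
    return {group: count / size for group, count, size in zip(GROUPS, COUNTS[gene], GROUP_SIZES)}


def groups_with(gene):
    """Ecological groups in which `gene` was found at least once."""
    return [group for group, count in zip(GROUPS, COUNTS[gene]) if count > 0]


def restricted_genes():
    """Genes found in exactly one ecological group, mapped to that group."""
    return {gene: groups_with(gene)[0] for gene in COUNTS if len(groups_with(gene)) == 1}


for gene in COUNTS:
    shares = ", ".join(f"{group}: {p:.0%}" for group, p in prevalence(gene).items())
    print(f"{gene} -> {shares}")
print("restricted:", restricted_genes())
```

Working with proportions rather than raw counts matters here because the groups differ more than fivefold in size (25 PGPR versus 135 animal pathogens).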

The distribution of certain PBFC genes according to bacterial ecological type could, at least in part, reflect taxonomic properties. This is indicated by the occurrence of PBFC genes in taxa restricted to a given ecological type (Fig. 3). In particular, ppdC was only retrieved in certain Azospirillum PGPR and, in the endophyte/symbiont category, in Bradyrhizobium. For many PBFC genes, however, their occurrence within a taxon was related to species/strain ecology. This was the case for phlACBD (Pseudomonas PGPR), hcnABC (all Pseudomonas types except phytopathogens), and nifHDK (mainly in PGPR and endophytes/symbionts from various proteobacterial taxa). The relation to ecology, if any, was weaker for ipdC and budAB (Enterobacteriaceae), acdS (all Burkholderiaceae considered and various Alphaproteobacteria and Gammaproteobacteria), nirK and pqq genes (various Proteobacteria corresponding to several ecological types).

The comparison of the 304 genomes showed that, unexpectedly, PBFC genes previously described as clustered (even forming operons in many cases) were not necessarily found together in the same genome (Fig. 4). For instance, pqqFG were close to pqqBCDE in Pseudomonas (and a few other genera), whereas pqqBCDE occurred without pqqG (encoding a family-S9 peptidase) and especially without pqqF (encoding a family-M16 peptidase) in most other Proteobacteria. Similar observations were made for phlD and phlACB, as well as for budAB and budC. Yet, the groups revealed by pairwise Fisher's exact tests (P < 0.01) corresponded mainly to genes involved in the same function (Fig. 3). This analysis showed that hcnABC and phlD linked the other phl genes with the six pqq genes, themselves linked to budABC/ipdC via nirK and to nifHDK. nifHDK were also linked, separately, to ppdC and to acdS.

Distribution of PBFC genes is partly related to proteobacterial phylogeny. We assessed whether the distribution of PBFC genes exhibited a significant phylogenetic signal, meaning that closely related species have similar gene content. Fritz and Purvis' D index analysis (Table 2) showed that the distribution of the PBFC genes was significantly influenced by evolutionary relationships between proteobacterial species, as indicated by D scores significantly less than 1. The genes phlACBD, pqqFG, budABC, ipdC, ppdC and hcnABC showed a strong phylogenetic signal, while acdS, nifHDK and pqqBCDE showed weaker signals.

Horizontal gene transfers had significant effects on PBFC gene distribution in Proteobacteria. When the impact of genome plasticity was assessed by inferring gene acquisition and loss events across proteobacterial species, no loss was detected for pqqF, phlACBD, ppdC and nirK (Table 3). In contrast, a few losses were inferred for the other genes, ranging from 1 loss for pqqG, budABC, ipdC and hcnABC to 6 losses for pqqB. Acquisitions were far more numerous, ranging from 1 for ipdC and budAB to 21 for acdS (Table 3).

Figure 2 | Co-occurrence network of PBFC genes according to the primary ecological classification of bacteria. The genes are depicted with a colored circle according to their encoded function. Each co-occurrence is represented by an edge (line) linking the corresponding genes. Computations were made for (a) endophytes/symbionts, (b) saprophytes, (c) phytopathogens and (d) animal pathogens.

All 23 genes appeared at least once in a distant ancestor of the species studied (Fig. 5). ipdC/ppdC and phlACBD are clade-specific: ipdC appeared in the last common ancestor (LCA) of the Pantoea and Erwinia genera, ppdC in the LCA of Azospirillum brasilense and in the LCA of Bradyrhizobium strains ORS278 and BTAi1, and phlACBD in the LCA of Pseudomonas fluorescens F113 and Pseudomonas brassicacearum NFM421 and in the LCA of Pseudomonas protegens Pf-5. budABC appeared in the LCA of the Enterobacteriaceae; budAB are clade-specific, but budC was acquired at least 7 times in other clades. The pqqBCDEFG genes appeared in the LCA of Pseudomonas; pqqG, pqqF and pqqBCDE were acquired 4, 5 and 15 times, respectively, by other taxa. At the extreme, nifHDK underwent at least 18 acquisitions and acdS (which appeared in the Burkholderiaceae LCA) 21 acquisitions, in both cases across the three proteobacterial classes considered.

Discussion 

In this study, plant-beneficial properties of PGPR were assessed for the first time on a broad scale, by considering (i) a large range of PBFC genes corresponding to various types of plant-beneficial properties, (ii) PGPR strains of contrasted taxonomic status (from the Alpha-, Beta- and Gammaproteobacteria), and (iii) a selection of non-PGPR Proteobacteria with primarily other biotic relations with plants (i.e. endophytes/symbionts and phytopathogens) or other types of ecology (i.e. saprophytes and animal pathogens).



It might have been expected that PGPR status entails a core set of PBFC genes shared by all PGPR strains, but the current results based on 25 emblematic PGPR strains indicate that none of the 23 key PBFC genes of the study was common to all strains, even though as many as 20 PGPR genomes displayed pqqBCDE. PQQ is a cofactor potentially implicated in several cellular processes (and incidentally also contributing to phosphate solubilization), which may explain its wide occurrence in PGPR. In comparison with Proteobacteria of other lifestyles, no PBFC genes restricted to PGPR were found, except for phlACB, and these genes were present in only 3 Pseudomonas PGPR. However, the number of PBFC genes increased along the continuum animal pathogens (only 1.8 PBFC genes/strain), phytopathogens, saprophytes, endophytes/symbionts, PGPR (as many as 7.5 PBFC genes/strain). The same findings were made when assessing the number of functions expected from these PBFC genes, except that the difference between animal and plant pathogens was not significant (Supplementary Fig. S1b).

Our gene distribution data suggest that PBFC genes might be selected in plant-associated habitats and counter-selected elsewhere, as exemplified by the very low number of these genes in animal pathogens (where only nirK was prevalent). This is in accordance with the expectations that most of the corresponding functions would not be relevant for animal physiology and that plants are not the primary habitat of these bacteria. For instance, nitrogen fixation is counter-selected in pathogenic bacteria. In addition, the results suggest that, amongst plant-associated bacteria, specific lifestyle is also a major factor explaining the distribution of PBFC genes, with higher prevalence in plant-beneficial strains. This possibility stems in particular from the comparison of (i) PGPR and endophytes/symbionts versus (ii) phytopathogens, despite the presence of the PBFC homologs budABC, ipdC and/or acdS (not necessarily together; Fig. 2c) in many phytopathogens (Table 1). Indeed, many of the plant-beneficial traits found in PGPR could be used by endophytic Proteobacteria documented (or presumed) to be in a mutualistic symbiosis with the plant host. This would be a generalization of previous observations made with nifHDK and, to a lesser extent, acdS.

Figure 3 | Phylogenetic distribution of genes along the Proteobacteria phylogeny. Internal circles: presence of a gene is indicated by a grey square and absence by a white square. Taxonomically coherent groups with the same gene content were collapsed for the sake of clarity. Biovars are indicated for Rhizobium leguminosarum and pathovars for Pseudomonas syringae.

Most PBFC genes were identified in bacteria from different ecological types (Table 1), which suggests that (i) strain information was not always sufficient to determine lifestyle precisely and/or (ii) boundaries between different lifestyles may not be very stringent in Proteobacteria. The first possibility is clear in the case of saprophytes, as this category contains a number of strains originating from bulk or rhizosphere soil whose PGPR potential has not been experimentally tested, raising the possibility that some of them could indeed be PGPR. Similarly, certain PGPR can also be endophytic, e.g. Azospirillum sp. B510 and Azoarcus sp. BH72, but some of the endophytes studied here have not been assessed for their effects on plants and so could not be listed among the PGPR. The second possibility is illustrated by many animal pathogens belonging to Pseudomonas or the Enterobacteriaceae that can colonize plants asymptomatically, probably because these alternative hosts promote bacterial survival before recolonization of the next primary animal host. This could explain why certain animal-associated strains displayed PBFC genes. Furthermore, opportunistic human pathogens such as Pseudomonas aeruginosa PAO1 and PA14 can also infect roots and lead to plant death.

Figure 4 | Co-occurrence network of the PBFC genes for the 304 genomes. The genes are depicted with a colored circle according to their encoded function. Each co-occurrence is represented by an edge (line) linking the corresponding genes. nirK does not appear in the figure because this gene did not show any significant co-occurrence with other PBFC genes.

Mutational inactivation of a particular PBFC gene may reduce (without necessarily abolishing) plant-beneficial effects in PGPR strains, and genetic acquisition of an additional PBFC gene has the potential to enhance PGPR performance. This suggests that possessing multiple PBFC genes could confer greater efficiency at enhancing plant growth. In this context, the analysis of co-occurrence patterns (Fisher's exact tests) can be useful to identify selection of multiple PBFC genes and their potential synergistic effects. However, gene co-occurrence may also arise because species that share a recent evolutionary history also share similar gene contents, a phenomenon known as phylogenetic signal. Indeed, the Fritz and Purvis index clearly pointed to gene associations related to phylogenetic signal, i.e. PBFC genes were more likely to be conserved in closely related species. This also raises the possibility that the potential to become a PGPR may rely (at least in part) on ancestral features of the corresponding bacterial taxa, which is consistent with previous findings on particular PGPR populations and, more generally, on function distributions in Gammaproteobacteria.

The distribution pattern of PBFC genes amongst Proteobacteria of various lifestyles and its relation to bacterial taxonomy prompted us to assess in more detail the evolutionary history of these genes. Ancestral character reconstructions showed few losses of PBFC genes, even in animal-associated bacteria, but many more gene acquisitions. Indeed, the role of horizontal gene transfer has been substantiated for various types of PGPR, which suggests that cooperative interactions between Proteobacteria and plant roots might have been established separately in various taxa, yielding PGPR strains whose effect(s) on the plant may rely on different, taxon-specific combinations of modes of action. Further genome sequencing efforts targeting close relatives of these PGPR would be needed to confirm this possibility. Despite conservation of PBFC genes across different ecological lifestyles, differential use/regulation of these genes depending on environmental and host conditions is likely, as can occur during exaptation. In this respect, expression patterns of PBFC genes according to taxonomic and/or lifestyle properties are an important ecological issue, which will deserve further research attention. Bacterial adaptation to new niches is mainly dependent on genetic novelty, which may entail gene acquisitions or differential regulation. Many examples of traits conferring environmental adaptation that were further co-opted as virulence factors are documented in human pathogens. Similar processes are likely to have taken place in PGPR as well.

Table 2 | Phylogenetic patterns of gene distribution in selected Proteobacteria. Values were calculated for the 1,000 partitions of the species phylogenetic tree.

| Gene^a | D^b | P(D > 0)^c | P(D < 1)^c | Phylogenetic signal strength |
|---|---|---|---|---|
| pqqB | 0.05 (0.01/0.07) | 0.16/0.41 | 0 | Strong |
| pqqC | 0.04 (0.00/0.07) | 0.16/0.45 | 0 | Strong |
| pqqD | 0.04 (0.00/0.07) | 0.16/0.45 | 0 | Strong |
| pqqE | 0.04 (0.01/0.07) | 0.17/0.45 | 0 | Strong |
| pqqF | -0.17 (-0.20/-0.15) | 0.95/0.99 | 0 | Very strong |
| pqqG | -0.18 (-0.21/-0.17) | 0.95/0.98 | 0 | Very strong |
| phlA | -0.40 (-0.76/-0.06) | 0.48/0.88 | 0.00/0.04 | Very strong |
| phlB | -0.40 (-0.75/-0.05) | 0.49/0.89 | 0.00/0.04 | Very strong |
| phlC | -0.40 (-0.75/-0.06) | 0.48/0.88 | 0.00/0.03 | Very strong |
| phlD | -0.24 (-0.36/-0.12) | 0.63/0.86 | 0 | Very strong |
| hcnA | -0.18 (-0.30/-0.06) | 0.62/0.95 | 0 | Strong |
| hcnB | -0.18 (-0.30/-0.05) | 0.61/0.95 | 0 | Strong |
| hcnC | -0.18 (-0.29/-0.06) | 0.61/0.94 | 0 | Strong |
| budA | -0.34 (-0.37/-0.30) | 0.98/0.99 | 0 | Very strong |
| budB | -0.34 (-0.37/-0.30) | 0.98/0.99 | 0 | Very strong |
| budC | -0.02 (-0.08/-0.03) | 0.40/0.70 | 0 | Strong |
| nirK | -0.05 (-0.09/-0.02) | 0.60/0.89 | 0 | Very strong |
| ipdC | -0.33 (-0.41/0.28) | 0.96/0.99 | 0 | Very strong |
| ppdC | -0.34 (-0.27/-0.47) | 0.65/0.80 | 0 | Very strong |
| acdS | 0.10 (0.07/0.13) | 0.05/0.19 | 0 | Moderate |
| nifD | 0.28 (0.23/0.33) | 0.01/0.05 | 0 | Weak |
| nifH | 0.28 (0.24/0.33) | 0.00/0.04 | 0 | Weak |
| nifK | 0.28 (0.23/0.33) | 0.00/0.04 | 0 | Weak |

^a The genes studied are involved in phosphate solubilization (pyrroloquinoline quinone; pqqBCDEFG), 2,4-diacetylphloroglucinol synthesis (phlACBD), hydrogen cyanide synthesis (hcnABC), induced systemic resistance (acetoine and 2,3-butanediol; budAB and budC, respectively), NO synthesis (copper nitrite reductase; nirK), IAA synthesis (indole-3-pyruvate decarboxylase/phenylpyruvate decarboxylase; ipdC/ppdC), plant ethylene regulation (ACC deamination; acdS), and nitrogen fixation (nitrogenase; nifHDK).
^b Median value (with the minimum and maximum values in parentheses).
^c Minimum and maximum values when different.

In conclusion, the comparison of taxonomically contrasted proteobacterial PGPR with a wide range of related, non-PGPR bacteria suggested that the emergence of PGPR status could have paralleled the accumulation of PBFC genes in root-adapted bacteria. It is likely that this process took place separately in taxonomically contrasted Proteobacteria and involved ancient gene acquisitions, which would explain why subsequent diversification produced taxonomic subgroups of PGPR strains differing from one another in the range of PBFC genes accumulated.

Methods 

Selection of genomes. The genomes used were selected among those available in October 2012. They corresponded to 25 PGPR, 56 endophytes/symbionts (35 endophytes and 21 root-nodulating bacteria), 29 saprophytes (3 from water environments, 6 from bulk soil, 16 from the rhizosphere, and 4 from healthy animal samples), 59 plant pathogens and 135 animal pathogens (124 of them infecting humans). Since gene distribution can be influenced by phylogenetic relatedness (phylogenetic signal), genomes were chosen so as to balance the prevalence of the various Alpha-, Beta- and Gammaproteobacterial groups for which PGPR genomes were available, following two principles. First, the primary lifestyle of the selected bacteria had to be documented sufficiently clearly, and their genomes had to be fully sequenced (except in a few cases for orders of particular interest). Second, bacterial orders in which PGPR representatives were available were searched for genomes of bacteria corresponding to other lifestyles (especially within the same or closely related families/genera), and if unsuccessful, the phylogenetically closest order was then targeted.

Homolog retrieval. Homologs of genes contributing to a phytobeneficial function in PGPR were retrieved using a BLAST-based method. A protein-to-protein search was done using BLASTP with a subset of genes documented to contribute to a given phytobeneficial function (Supplementary Table S1). As annotations in public databases may contain errors or sometimes fail to accurately predict gene identity, we then performed a TBLASTN search on genomic sequences to overcome these limitations. An E-value threshold of 1e-15 was set to filter BLAST searches.
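A minimal sketch of the E-value filtering step, assuming the searches were written in BLAST+ tabular format (`-outfmt 6`, default 12 columns). The sequence identifiers in the example are hypothetical. With BLAST+, the same cutoff can also be applied at search time through the `-evalue 1e-15` option.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
EVALUE_THRESHOLD = 1e-15  # threshold stated in the Methods


def parse_outfmt6(lines):
    """Yield one dict per hit, skipping blank and comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        hit = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        yield hit


def filter_hits(lines, threshold=EVALUE_THRESHOLD):
    """Keep hits whose E-value does not exceed the threshold."""
    return [hit for hit in parse_outfmt6(lines) if hit["evalue"] <= threshold]


EXAMPLE = [  # hypothetical BLASTP rows
    "acdS_ref\tgenomeX_0042\t78.5\t338\t72\t1\t1\t338\t1\t339\t2e-180\t512",
    "acdS_ref\tgenomeX_1187\t27.1\t210\t140\t6\t40\t250\t35\t240\t3e-08\t52.4",
]
print([hit["sseqid"] for hit in filter_hits(EXAMPLE)])
```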

Protein family assignment. Homologous proteins were assigned to families having the same putative function using a combination of significant sequence identity (see above) and protein domain assignment. Protein domains were assigned using RPS-BLAST and the Conserved Domain Database (CDD). We split the NCBI-curated domains (which are considered more accurate) and domains from external sources into two distinct databases. The NCBI-curated database was preferentially used for protein domain assignments, while the external-source database was used when the NCBI-curated one returned no results. Proteins were considered to belong to the same family if they (i) shared at least 30% identity over at least 70% of their respective protein sequence lengths and (ii) shared the same domains with a reference phytobeneficial protein. Phylogenetic profiles (binary vectors in which the presence and absence of a gene in each genome are indicated as 1 and 0, respectively) were used to represent the presence/absence of a particular gene in the different organisms for analysis of phylogenetic signal and ancestral state reconstruction.
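The two family-assignment criteria and the resulting phylogenetic profile can be sketched as follows. This is a hypothetical helper, not the authors' code. It assumes that alignment coordinates and protein lengths are available for each hit, reads "respective protein sequence length" as coverage of both the reference and the candidate protein, and treats "the same domains" as identical sets of domain identifiers. The domain names below are placeholders.

```python
from dataclasses import dataclass, replace


@dataclass
class Hit:
    pident: float  # percent identity of the alignment
    qstart: int    # alignment start/end on the reference (query) protein
    qend: int
    sstart: int    # alignment start/end on the candidate (subject) protein
    send: int
    qlen: int      # full length of the reference protein
    slen: int      # full length of the candidate protein


def coverage(start, end, length):
    """Fraction of a sequence covered by an alignment (coordinates are 1-based, inclusive)."""
    return (abs(end - start) + 1) / length


def same_family(hit, candidate_domains, reference_domains,
                min_identity=30.0, min_coverage=0.70):
    """Apply the identity/coverage rule and the shared-domain rule."""
    return (hit.pident >= min_identity
            and coverage(hit.qstart, hit.qend, hit.qlen) >= min_coverage
            and coverage(hit.sstart, hit.send, hit.slen) >= min_coverage
            and set(candidate_domains) == set(reference_domains))


def phylogenetic_profile(genomes, genomes_with_gene):
    """Binary presence (1) / absence (0) vector over an ordered list of genomes."""
    return [1 if genome in genomes_with_gene else 0 for genome in genomes]


hit = Hit(pident=45.0, qstart=1, qend=300, sstart=5, send=310, qlen=320, slen=330)
print(same_family(hit, {"domain_X"}, {"domain_X"}))
print(phylogenetic_profile(["g1", "g2", "g3"], {"g2"}))
```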

Gene distribution. For statistical analysis of the number of PBFC genes and the number of corresponding functions per genome according to primary bacterial ecology, the Wilcoxon test was used with the R command wilcox.test (P < 0.05).
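For readers working outside R, the rank-sum comparison can be sketched in Python. The code follows the normal approximation with tie and continuity corrections, which is the route R's `wilcox.test` takes when ties are present (as is typical for integer gene counts). It is an illustrative re-implementation, the per-genome counts in the example are hypothetical, and the mean ± SD summary assumes that the ± values in the Results are sample standard deviations.

```python
import math
import statistics


def average_ranks(values):
    """Ranks 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Return (W, two-sided P) via the normal approximation with tie and continuity corrections."""
    n_x, n_y = len(x), len(y)
    n = n_x + n_y
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    w = sum(ranks[:n_x]) - n_x * (n_x + 1) / 2  # rank sum of x minus its minimum
    tie_sizes = [pooled.count(v) for v in set(pooled)]
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    sigma = math.sqrt(n_x * n_y / 12 * ((n + 1) - tie_term))
    z = w - n_x * n_y / 2
    correction = 0.0 if z == 0 else math.copysign(0.5, z)
    z = (z - correction) / sigma
    return w, min(1.0, math.erfc(abs(z) / math.sqrt(2)))


pgpr_counts = [14, 9, 7, 8, 3, 1, 10]    # hypothetical PBFC genes per genome
sapro_counts = [5, 6, 4, 7, 5, 3, 12]    # hypothetical
print(f"{statistics.mean(pgpr_counts):.1f} ± {statistics.stdev(pgpr_counts):.1f}")
print(wilcoxon_rank_sum(pgpr_counts, sapro_counts))
```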

Proteobacterial phylogenetic tree. The proteobacterial phylogenetic tree was based on 31 housekeeping markers identified, aligned and trimmed with Amphora2, as done previously. Trees were inferred with ExaML from the concatenated alignment, using 1,000 replicates and the PSR model of rate heterogeneity.
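Concatenating the per-marker alignments into a single supermatrix is a simple bookkeeping step. Below is a hypothetical sketch, assuming that each marker alignment is a dict of equal-length aligned sequences and that a taxon lacking a marker is padded with gaps. It returns 1-based inclusive partition coordinates, the form commonly used in partition files, but does not attempt to reproduce the exact file format expected by ExaML.

```python
def concatenate_alignments(markers):
    """Build a supermatrix from [(marker_name, {taxon: aligned_seq}), ...].

    Returns ({taxon: concatenated_seq}, [(marker_name, start, end), ...]),
    with 1-based, inclusive partition coordinates.
    """
    taxa = sorted({taxon for _, alignment in markers for taxon in alignment})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for name, alignment in markers:
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError(f"marker {name!r} is empty or not aligned to a single width")
        width = widths.pop()
        for taxon in taxa:
            # A taxon lacking this marker contributes a run of gaps of the same width.
            pieces[taxon].append(alignment.get(taxon, "-" * width))
        partitions.append((name, start, start + width - 1))
        start += width
    return {taxon: "".join(parts) for taxon, parts in pieces.items()}, partitions


markers = [  # hypothetical marker alignments
    ("markerA", {"taxon1": "MKV-L", "taxon2": "MKVQL"}),
    ("markerB", {"taxon1": "GA", "taxon3": "GS"}),
]
supermatrix, partitions = concatenate_alignments(markers)
for taxon, seq in supermatrix.items():
    print(taxon, seq)
print(partitions)
```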

Figure 5 | Reconstruction of acquisitions and losses of PBFC genes in relation to the evolutionary history of sequenced bacteria. When different strains of the same species had the same PBFC gene profile, only one representative strain was kept in the Maximum-Likelihood tree to avoid redundant information. Acquisitions are indicated by a blue arrow with a circle and losses by a red arrow with a triangle.

Computation of phylogenetic signal. The phylogenetic signal for each gene was calculated using Fritz and Purvis' D index, as implemented in the R package "caper". The probabilities under random and Brownian-motion models of evolution were computed from 10,000 permutations. Briefly, a given trait (a gene in our case) displays a highly clustered distribution if D < 0, is as clustered as if it had evolved under Brownian motion if D = 0, displays a random distribution if D = 1, and is overdispersed if D > 1. D scores were then compared to infer, on an arbitrary scale, the strength of the phylogenetic signal for each gene.
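The D index compares the observed sum of sister-clade differences with its expectation under random and Brownian-threshold evolution: D = (ΣdObs - mean ΣdB) / (mean ΣdR - mean ΣdB). The Python sketch below is a simplified re-implementation for illustration only; caper's `phylo.d` remains the reference implementation. It assumes a binary tree written as nested tuples with unit branch lengths, computes node values as the unweighted mean of the two daughters, and builds the Brownian expectation by simulating a continuous trait and marking the highest values as "present" so that prevalence matches the observed trait. The tree and trait patterns are hypothetical.

```python
import random
import statistics

# Hypothetical balanced tree: nested 2-tuples of tip names, unit branch lengths assumed.
TREE = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))


def tips(node):
    """Tip names in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in tips(child)]


def sister_difference_sum(tree, trait):
    """Sum of |left - right| over internal nodes; node value = mean of its two daughters."""
    total = 0.0

    def value(node):
        nonlocal total
        if isinstance(node, str):
            return float(trait[node])
        left, right = value(node[0]), value(node[1])
        total += abs(left - right)
        return (left + right) / 2

    value(tree)
    return total


def brownian_tips(tree, rng):
    """Simulate a continuous trait by Brownian motion (unit variance per branch)."""
    values = {}

    def walk(node, x):
        x += rng.gauss(0.0, 1.0)
        if isinstance(node, str):
            values[node] = x
        else:
            for child in node:
                walk(child, x)

    walk(tree, 0.0)
    return values


def threshold(values, n_present):
    """Mark the n_present highest continuous values as present (1), the rest absent (0)."""
    ranked = sorted(values, key=values.get, reverse=True)
    present = set(ranked[:n_present])
    return {name: int(name in present) for name in values}


def fritz_purvis_d(tree, trait, n_sim=1000, seed=1):
    rng = random.Random(seed)
    names = tips(tree)
    states = [trait[name] for name in names]
    observed = sister_difference_sum(tree, trait)
    random_sums, brownian_sums = [], []
    for _ in range(n_sim):
        shuffled = states[:]
        rng.shuffle(shuffled)  # random expectation: permute tip states
        random_sums.append(sister_difference_sum(tree, dict(zip(names, shuffled))))
        simulated = threshold(brownian_tips(tree, rng), sum(states))
        brownian_sums.append(sister_difference_sum(tree, simulated))
    mean_random = statistics.fmean(random_sums)
    mean_brownian = statistics.fmean(brownian_sums)
    return (observed - mean_brownian) / (mean_random - mean_brownian)


clustered = dict(zip(tips(TREE), [1, 1, 1, 1, 0, 0, 0, 0]))
alternating = dict(zip(tips(TREE), [1, 0, 1, 0, 1, 0, 1, 0]))
print(fritz_purvis_d(TREE, clustered), fritz_purvis_d(TREE, alternating))
```

As expected from the definitions above, the clustered pattern yields D at or below 0, whereas the alternating pattern yields D above 1 (overdispersion).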

Ancestral state character reconstruction. The GLOOME algorithm was used to infer the presence or absence of each gene at each node of a phylogenetic tree, based on its distribution in terminal taxa. The phylogenetic tree used was computed as described above but was based on a filtered alignment. When several bacteria of the same species had the same content in genes of interest, only the reference strain indicated in the NCBI database was retained. This simplified the reconstruction model by removing redundant information. Reconstructions were made with the Maximum Parsimony method, which reconstructs ancestral states by minimizing the number of character change events along a phylogenetic tree.
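The parsimony principle can be illustrated on a single presence/absence character. The GLOOME parsimony settings (for example, relative costs of gains and losses) are not described here, so the Python sketch below uses plain, unweighted Fitch parsimony instead. It breaks ties at the root in favour of absence (a stated assumption) and counts gains and losses along the branches of one most-parsimonious reconstruction. The tree and gene states are hypothetical.

```python
def fitch_sets(node, states, sets):
    """Bottom-up Fitch pass: record each node's candidate state set; return the minimum change count."""
    if isinstance(node, str):
        sets[node] = {states[node]}
        return 0
    left, right = node
    changes = fitch_sets(left, states, sets) + fitch_sets(right, states, sets)
    shared = sets[left] & sets[right]
    if shared:
        sets[node] = shared
    else:
        sets[node] = sets[left] | sets[right]
        changes += 1
    return changes


def assign_states(node, sets, parent_state, events, ancestral):
    """Top-down pass: keep the parent's state when allowed, else take the smallest allowed state (0 = absent)."""
    allowed = sets[node]
    state = parent_state if parent_state in allowed else min(allowed)
    ancestral[node] = state
    if parent_state is not None and state != parent_state:
        events["gains" if state == 1 else "losses"] += 1
    if not isinstance(node, str):
        for child in node:
            assign_states(child, sets, state, events, ancestral)


def reconstruct(tree, states):
    """Return (minimum number of changes, gain/loss counts, state of every node)."""
    sets, ancestral = {}, {}
    events = {"gains": 0, "losses": 0}
    n_changes = fitch_sets(tree, states, sets)
    assign_states(tree, sets, None, events, ancestral)
    return n_changes, events, ancestral


tree = (("a", "b"), ("c", "d"))  # hypothetical four-taxon tree
n_changes, events, ancestral = reconstruct(tree, {"a": 1, "b": 1, "c": 0, "d": 0})
print(n_changes, events, ancestral[tree])
```

In this example, the gene is inferred to be absent at the root and gained once on the branch leading to the clade formed by a and b. Weighting losses more heavily than gains, as some gene-content models do, would require a Sankoff-style cost matrix instead of the Fitch sets used here.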